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Ph ' Abstract 

We describe the new version (v2.49t) of the code hfodd which solves the nuclear Skyrme 
<N . Hartree-Fock (HF) or Skyrme Hartree-Fock-Bogolyubov (HFB) problem by using the Cartesian 
deformed harmonic-oscillator basis. In the new version, we have implemented the following 
10 \ physics features: (i) the isospin mixing and projection, (ii) the finite temperature formalism 
for the HFB and HF+BCS methods, (iii) the Lipkin translational energy correction method, 
(iv) the calculation of the shell correction. A number of specific numerical methods have also 
O ■ been implemented in order to deal with large-scale multi-constraint calculations and hardware 
limitations: (i) the two-basis method for the HFB method, (ii) the Augmented Lagrangian 
Method (ALM) for multi-constraint calculations, (iii) the linear constraint method based on 
the approximation of the RPA matrix for multi-constraint calculations, (iv) an interface with 
^ ■ the axial and parity-conserving Skyrme-HFB code hfbtho, (v) the mixing of the HF or HFB 
. matrix elements instead of the HF fields. Special care has been paid to using the code on 
massively parallel leadership class computers. For this purpose, the following features are now 
available with this version: (i) the Message Passing Interface (MPI) framework (ii) scalable 
input data routines (iii) multi-threading via OpenMP pragmas (iv) parallel diagonalization of 
the HFB matrix in the simplex breaking case using the ScaLAPACK library. Finally, several 
little significant errors of the previous published version were corrected. 

PACS numbers: 07.05.T, 21.60.-n, 21.60.Jz 

NEW VERSION PROGRAM SUMMARY 

Title of the program: HFODD (v2.49t) 
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Catalogue number: 



Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland (see 
application form in this issue) 

Reference in CPC for earlier version of program: J. Dobaczewski, W. Satula, B.G. Carlsson, 
J. Engel, P. Olbratowski, P. Powalowski, M. Sadziak, J. Sarich, N. Schunck, A. Staszczak, M. 
Stoitsov, M. Zalewski, H. Zdunczuk, Comput. Phys. Commun. 180 (2009) 2361 (v2.40h). 

Catalogue number of previous version: ADFL_v2_l 

Licensing provisions: GPL v3 

Does the new version supersede the previous one: yes 

Computers on which the program has been tested: Intel Pentium-Ill, Intel Xeon, AMD- Athlon, 
AMD-Opteron, Cray XT4, Cray XT5 

Operating systems: UNIX, LINUX, Windows'^P 

Programming language used: FORTRAN-90 

Memory required to execute with typical data: 10 Mwords 

No. of bits in a word: The code is written in single-precision for the use on a 64-bit processor. 
The compiler option -r8 or +autodblpad (or equivalent) has to be used to promote all real 
and complex single-precision floating-point items to double precision when the code is used on 
a 32-bit machine. 

Has the code been vectorised?: Yes 
Has the code been parallelized?: Yes 

No. of lines in distributed program: 104 666 (of which 47 059 are comments and separators) 

Keywords: Hartree-Fock; Hartree-Fock-Bogolyubov; Skyrme interaction; Self-consistent mean 
field; Nuclear many-body problem; Superdeformation; Quadrupole deformation; Octupole de- 
formation; Pairing; Nuclear radii; Single-particle spectra; Nuclear rotation; High-spin states; 
Moments of inertia; Level crossings; Harmonic oscillator; Coulomb field; Pairing; Point sym- 
metries; Yukawa interaction; Angular-momentum projection; Generator Coordinate Method; 
Schiff moments; Isospin mixing; Isospin projection. Finite temperature; Shell correction; Lipkin 
method; Multi-threading; Hybrid programming model; High-performance computing. 

Nature of physical problem 

The nuclear mean field and an analysis of its symmetries in realistic cases are the main in- 
gredients of a description of nuclear states. Within the Local Density Approximation, or for a 
zero-range velocity-dependent Skyrme interaction, the nuclear mean field is local and velocity de- 
pendent. The locality allows for an effective and fast solution of the self- consistent Hartree-Fock 
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equations, even for heavy nuclei, and for various nucleonic (n-particle n-hole) configurations, 

deformations, excitation energies, or angular momenta. Similarly, Local Density Approximation 
in the particle-particle channel, which is equivalent to using a zero-range interaction, allows for 
a simple implementation of pairing effects within the Hartree-Fock-Bogolyubov method. 

Method of solution 

The program uses the Cartesian harmonic oscillator basis to expand single-particle or single- 
quasiparticle wave functions of neutrons and protons interacting by means of the Skyrme effective 
interaction and zero-range pairing interaction. The expansion coefficients are determined by 
the iterative diagonalization of the mean-field Hamiltonians or Routhians which depend non- 
linearly on the local neutron and proton densities. Suitable constraints are used to obtain states 
corresponding to a given configuration, deformation or angular momentum. The method of 
solution has been presented in: J. Dobaczewski and J. Dudek, Comput. Phys. Commun. 102 
(1997) 166. 

Summary of revisions 

1. Isospin mixing and projection of the HF states has been implemented. 

2. The finite-temperature formalism for the HFB equations has been implemented. 

3. The Lipkin translational energy correction method has been implemented. 

4. Calculation of the shell correction has been implemented. 

5. The two-basis method for the solution to the HFB equations has been implemented. 

6. The Augmented Lagrangian Method (ALM) for calculations with multiple constraints has 
been implemented. 

7. The linear constraint method based on the cranking approximation of the RPA matrix 
has been implemented. 

8. An interface between hfodd and the axially-symmetric and parity-conserving code HF- 
BTHO has been implemented. 

9. The mixing of the matrix elements of the HF or HFB matrix has been implemented. 

10. A parallel interface using the MPl library has been implemented. 

11. A scalable model for reading input data has been implemented. 

12. OpenMP pragmas have been implemented in three subroutines. 

13. The diagonahzation of the HFB matrix in the simplex-breaking case has been parallehzed 
using the ScaLAPACK library. 

14. Several little significant errors of the previous published version were corrected. 

Restrictions on the complexity of the problem 
Typical running time 

Unusual features of the program 

The user must have access to (i) the NAGLIB subroutine f02axe, or LAPACK subroutines 
ZHPEV, ZHPEVX, ZHEEVR, Or ZHEEVD, which diagonahze complex hermitian matrices, (ii) the 
LAPACK subroutines dgetri and dgetrf which invert arbitrary real matrices, (ill) the LA- 
PACK subroutines DSYEVD, DSYTRF and DSYTRI which compute eigenvalues and eigenfunctions 
of real symmetric matrices and (iv) the LINPACK subroutines ZGEDI and ZGECO, which invert 
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arbitrary complex matrices and calculate determinants, (v) the BLAS routines dcopy, dscal, 
DGEEM and DGEMV for double-precision linear algebra and ZCOPY, zdscal, zgeem and zgemv 
for complex linear algebra, or provide another set of subroutines that can perform such tasks. 
The BLAS and LAPACK subroutines can be obtained from the Netlib Repository at the Uni- 
versity of Tennessee, Knoxville: http : //netlib2 . cs . utk . edu/. 

LONG WRITE-UP 



1 Introduction 

The method of solving the Hartree-Fock (HF) equations in the Cartesian harmonic oscilla- 
tor (HO) basis was described in the publication, Ref. [Ij. Five versions of the code hfodd 
were previously published: (vl.60r) p],(vl.75r) [3], (v2.08i) [1], (v2.08k) |5], and (v2.40h) |6]. 
The User's Guide for version (v2.40v) is available in Ref. [7] and the code home page is at 
http : //www. fuw. edu.pl/~dobaczew/hf odd/hf odd. html. The present paper is a long write-up 
of the new version (v2.49t) of the code hfodd. This extended version features the isospin mixing 
and projection of the HF states, the finite-temperature formalism for the HF+BCS and HFB 
equations, and several other major modifications. It is also built upon a hybrid MPI/OpenMP 
parallel programing model which allows large-scale calculations on massively parallel computers. 
In serial mode, it remains fully compatible with all previous versions. Information provided in 
previous publications [2]-[S] thus remains valid, unless explicitly mentioned in the present long 
write-up. 

In Section [2] we briefly review the modifications introduced in version (v2.49t) of the code 
HFODD. We distinguish between features implementing (i) new physics modeling capabilities, 
(ii) new numerical techniques and (iii) parallel computing methods. Section [3] lists all additional 
new input keywords and data values, introduced in version (v2.49t). In serial mode, the structure 
of the input data file remains the same as in the previous versions, see Section 3 of Ref. [2J . In 
parallel mode, two input files, with strictly enforced names, must be used: hfodd . d has the same 
keyword structure as all previous hfodd input files, with the restriction that not all keywords 
can be activated (see list in Sec. 13. 3.11) : hf odd_mpiio . d contains processor-dependent data, see 
SecEXl 

2 Modifications introduced in version (v2.49t) 

2.1 New Physics Features 
2.1.1 Isospin Mixing and Projection 

The concept of isospin symmetry, having its roots in the approximate charge independence 
of the nucleon-nucleon interaction, was already introduced in nuclear physics in the 1930s by 
Heisenberg and Wigner [HI |9]. Throughout the years, it has proven to be extremely powerful 
and not abated by the presence of the Coulomb force - the main source of the isospin symmetry 
violation in nuclei - simply because the isovector and isotensor parts of the Coulomb force are 
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much weaker than the dominant, isospin symmetry preserving components of the Coulomb and 
strong interactions. 

Apart from the exphcit violation of the isospin symmetry due to the strong and, pre- 
dominantly. Coulomb interactions, various approximate theoretical methods used in nuclear 
structure calculations are often the sources of unphysical violation of this symmetry by them- 
selves flU\ \TT\ [T2| IT3| [H]. This specifically concerns the Hartree-Fock and Kohn-Sham theories 
that employ independent-particle wave functions, which manifestly break the isospin symme- 
try in N ^ Z nuclei even for isospin-conserving interactions. The most prominent effects of 
this spontaneous isospin-symmetry-breaking occur, however, in the ground-state configuration 
of odd-odd N = Z nuclei and in T 7^ excited configurations of = Z nuclei, see Ref. p3] 
and references cited therein. Hence, practical implementation of the method requires the isospin 
projection and subsequent rediagonalization of the entire Hamiltonian in the isospin-projected 
basis. These two major building blocks of the isospin projection method will be described be- 
low. The discussion will be followed by a short presentation of the extended version of our 
model including the isospin and angular-momentum projections which is needed for specific 
applications including calculation of the isospin-symmetry breaking corrections to superallowed 
/3-decay [ia[ISl[I7]. 

The isospin projection: To remove the unphysical isospin-symmetry violation introduced 
by the mean-field (MF) approximation, the code hfodd (v2.49t) was equipped with a new 
tool allowing for the isospin projection after variation of an arbitrary symmetry-unrestricted 
Slater determinant 1$) provided by the code. The method implemented uses the standard 
one-dimensional isospin-projection operator Pt^t^'- 

1 - 2T -\- 1 

m) = ^==PItJ^) = d(3T sin (3Tdl^^{PT)RiMm, (1) 

which allows for decomposing the Slater determinant |$), 

|$)= bTTjTT,), (2) 

T>\T,\ 

into good-isospin basis \TTz). Here, /3t denotes the Euler angle associated with the rotation 
operator R{/3t) = e"*^"^"^" about the y-axis in the isospace, d^^rp^i^Px) is the Wigner function [TS] . 
and Tz = {N — Z)/2 is the third component of the total isospin T. The normalization factors 
Ntt^, or interchangeably the expansion coefficients Btt^, read: 

2T + 1 

Ntt. = \bTTf = {^\PlTj'^) = ^^J^d^Tsin{3dlTSMmM, (3) 

where 

xiPT) = {miPT)m (4) 

stands for the overlap kernel. 

The isospin projection operator is used to construct a subspace (basis) of good-isospin states 
\TTz). Its size is controlled by the parameter et, such that only the states \TTz) that have 
tangible contributions, I&tt^P > ^t, to the MF state are retained for further rediagonalization. 
In practice, Et = 10~^° sets the limit of T < jT^I + 5. The good-isospin basis created in this 
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way, although of rather small dimension, is believed to capture the right balance between the 
short-range strong interaction and the long-range Coulomb force (see discussion in Ref. |13]). 
The parameter Et = 10~^° also ensures that the two basic quantities reflecting the accuracy of 
the method, namely, the overlap (normalization) sum rule, 

Ei^^^^i' = i' (5) 

T>\T,\ 

and total MF energy sum rule, 

EMF = mm)= b*T,TbTTATX\H\TT,), (6) 

TT'>\T^\ 

are both fulfilled with extremely high accuracy. The latter property is due to the fact that 
the isospin-projection method is practically free from divergences plaguing particle-number and 
angular-momentum [121 [201 EB E2j methods. An analytical proof of this rather remarkable 
feature of the isospin projection is given in Ref. 

Rediagonalization in the isospin projected basis: Expansion coefficients 6t,t^ do not 
reflect the physical isospin mixing. Indeed, they are affected by the spurious isospin mixing, 
which is due to the spontaneous breaking of the isospin symmetry caused by the MF approx- 
imation. To calculate the true isospin mixing, one needs to rediagonalize the total nuclear 
Hamiltonian in the good-isospin basis |TT^). The present implementation of the code hfodd 
admits Hamiltonians that include the isoscalar part of the kinetic energy T, isospin-invariant 
Skyrme functional, , and Coulomb force, V"; the latter can further be decomposed into the 
isoscalar, V^, isovector, V^, and isotensor, V,^, components, that is. 



H = T + V'' + V'^ = f + V^ + V^Q+V^Q+V^Q, (7) 



where 



V^oir^,) = (rSA?-^r«ofO)). (10) 

Note, that the components are constructed by coupling the spherical components of the 
one-body isospin operator: 



1 
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^10 = ^2, fi±i = =F— (f^ ± ify) , (11) 



where fi, i = x,y, z denote Pauli matrices and symbol o stands for the scalar product of isovec- 
tors. Hence, from a mathematical viewpoint, they represent isoscalar, covariant rank-1 (isovec- 
tor), and covariant rank-2 axial (isotensor) spherical tensor components of the Coulomb inter- 
action, respectively. This mathematical property allows the use of Racah algebra in order to 
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calculate matrix elements of the Hamiltonian. The rather lengthy details concerning this specific 
theoretical aspect of our model are given in Ref. [H] and will not be repeated here. 

Rediagonalization of the total Hamiltonian in the good-isospin basis leads to the eigenstates: 

|n,T,)= ^ttJTT,), (12) 

T>\T,\ 

numbered by index n. Apart of the eigenenergies -En.T^ and the amplitudes a^T^ that define 
the degree of isospin mixing, the code also provides the so-called isospin (Coulomb) mixing 
coefficients or, equivalently, the isospin impurities. For the n-th eigenstate, the isospin impurity 
is defined as ag, = 1 — IctTTjmax' where IdTTjuin^ stands for the squared norm of the dominant 
amplitude in the wave function |n, Tj), and is given in percents. 

Evaluation of the isospin impurity ac is a prerequisite for determining the isospin-breaking 
corrections 6c to the 0^ — t- 0^ Fermi matrix element of the isospin raising/lowering operator 
T±. Of particular interest in nuclear physics are the Fermi matrix elements: 

|(/ = 0, T ^ 1, = ±l|f±|/ = 0, T ^ 1, T, = 0) p = 2(1 - 6c), (13) 

for a set of nuclei undergoing the super-allowed beta decay, because the 6c parameter is the key 
nuclear quantity needed for precise nuclear tests of the conserved-vector-current hypothesis and 
for the determination of the up-down matrix element in the Cabibbo-Kobayashi-Maskawa matrix 
(see Ref. [23] and references quoted therein). The calculation of the Fermi matrix elements f fT3|) 
was one of the primary motivations to couple the newly developed isospin projection with the 
existing angular-momentum projection [B]. Indeed, such a four- dimensional projection appears 
to be absolutely necessary to get reliable representation of decaying states in daughter (parent) 
\I = 0,T ^ l,Tz = ±1) and parent (daughter) \I = 0,T ^ l,Tz = 0) nuclei undergoing 
the super- allowed transition, respectively (see Ref. [15]). It should be stressed, however, that 
the range of applicability of the four- dimensional projection is by no means limited to the 
computation of matrix elements (fT3|) but also encompasses, in particular, various applications 
in high-spin physics in iV ~ Z nuclei. 

The four-dimensional isospin and angular momentum projection: The implemen- 
tation of the four- dimensional projection follows rather closely the angular-momentum projec- 
tion scheme adopted in version (v2.40h) of the code and described in detail in Ref. [6] (cf. 
Refs. [22| 121]. Hence, in the following, we will refrain from technicalities and concentrate on 
discussing the main building blocks of the method. The starting point is the good angular 
momentum / and good isospin T basis generated by acting with standard angular-momentum 
Pmk isospin Pj^^t^ projectors on the Slater determinant |$): 

\IMK-TT,) = Pl^Pi,^\^), (14) 

where M and K stand for the angular-momentum projections along the laboratory and intrinsic 
z-axes, respectively. The basis composed of states \IMK\ TT^) is over-complete. This problem is 
overcome by constructing, separately for each I and T, the so-called collective subspace spanned 
by the natural states: 

\IM-TT,)^'^^ = ^=Yt]^;^\iMK-TT,). (15) 
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The m}^ natural state is constructed by using the mixing amphtudes r]^^^ that correspond to 
the m*^ eigenstate of the norm matrix, see Eq. (9) in [6]: 

Y.^7i''^K^=n^n^K'- (16) 



K' 



Only the eigenstates corresponding to eigenvalues rim > C s-^'s taken into account, with the basis 
cut-off parameter ( introduced in Ref. [6]. In this way, for each value of the angular momentum / 
and isospin T, the collective subspace contains mniax(-^, T) states. The overlap matrix appearing 
in Eq. f|T6|) reads: 



= ^^^^iQ^I^'^^ I dhdl^rSM I dnD'*^,{n) ($|i?(/3T)i?(fi)|$), (17) 

where R{VL) = Q-^'^i^Q-^t^h^-i'iJ^ stands for the space-rotation operator, which depends on three 
Euler angles Q = {a,/3,j), and D^^,(f2) is the Wigner function. 

To simultaneously take into account the ii'-mixing and isospin mixing, the code performs, 
separately for each value of the angular momentum /, the full diagonalization of the total 
Hamiltonian ([7]) in the n(J)-dimensional, n{I) = | mmax(-^, 7"), collective space spanned 

by natural states (fT5l) . Such a diagonalization leads to the eigenstates of the form: 

^max 

\n;IM;T,)= J2 E aSWI^^; ^^.)^'"^ (18) 

T>|T2| m=l 

which are labeled by the conserved quantum numbers /, M, and = {N — Z)/2, and by the 
additional index n, which characterizes the K and isospin mixing. For the sake of complete- 
ness, it is worth mentioning that the code also provides the i^-mixed and isospin-conserving 
eigenstates, which result from the diagonalization of the total Hamiltonian with all isospin- 
symmetry-breaking (AT 7^ 0) matrix elements set to zero. 



2.1.2 Finite-temperature Formalism 

The equilibrium state of a physical system at constant temperature T and chemical potential A 
is obtained from the minimization of the grand canonical potential fl [2^ [2Sl EZl ESI 121] 

n = E -TS - XN, (19) 

where the energy [E), entropy (S) and particle-number (A^) are statistical averages and are 
given by 

E = Tt{VH), (20) 

S = -kTT{V\nV), (21) 

N = Tr{VN). (22) 
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The density operator V and the grand partition function Z are defined, respectively, as 

V = ie-'^^^-^^) , (23) 
Z 

Z = Tr [e-^(^-^^)] , (24) 

where /3 = 1/ kT and H is the two-body Hamiltonian. In the MF approximation, the two-body 
density operator in Eq. ( 1251) is replaced by a one-body operator. It has been demonstrated in 
[22] that the variation of the grand canonical potential with respect to the density operator P 
leads to HFB equations that are formally equivalent to the T = equations, namely 

where h and A are the Hartree-Fock and pairing potentials, and are obtained from the energy 
density functional as usual. The inclusion of finite temperature in the formalism is achieved 
by generalizing the expression of the density matrix and pairing tensor. In configuration space, 
they read [29] 

p = ufU^ + V*{l-f)V^, (26) 
K = ufV^ + V*il- f)U^, (27) 

where the quantity "/" stands for the Fermi function, defined as, 

•^^ = l+\^E, ' (28) 



with the /i quasi-particle energy. In coordinate space, their expression becomes 

p{ra,r'a') = Yl {UU^''\ra)U^^>{r'a') + {1 - U)V^''>{ra)V^'^\r'a')} , (29) 

ax 

K{ra,r'a') = ^ {f^U'^^\ra)V^^'^*{r'a') + {1 - f^)V'-^>{ra)U^''\r'a')} . (30) 

ax 

All additional densities (kinetic, spin-current, etc.) are derived from the particle density ( l29l) . 
In HFODD, the conventional pairing tensor is replaced by the pairing density p{ra, r'a') obtained 
according to: p{ra,r'a') = —2a'K{ra,r' — cr'), see [30] . 

The code hfodd implements the finite-temperature formalism in the HFB and HF-I-BCS 
modes. For the latter case, we refer to Sec. 5 of [25] for the details of the expressions coded. 
For the former case, we would like to emphasize that the calculation of the Fermi level A needs 
to be modified at T > 0. Let us recall that the adjustment of A in hfodd is based on BCS 
formula [30]: given the equivalent spectrum of single-particle (s.p.) states and pairing gaps 
A^, the particle number is computed according to: 
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with the Fermi functions of Eq. (p8l) and the occupation factors given by: 



RBCS 



u, 



(32) 



with E^^^ 



When applying the Newton- Raphson method to determine A by 

the condition that N = Nq, the imphcit dependence of the on A must be taken into account. 
This is done by updating the at each A according to: 



1 



1 + e''^?' 



(33) 



and introducing the corresponding additional term df^/dX in the derivative dN/dX. The con- 
tribution from the thermal occupation factors is crucial for the convergence in the unpaired 
regime. Note that in the case of zero-range pairing interactions, there can be stability issues 
for low cut-offs near the phase transition. Recently, the finite temperature extension of hfodd 
has been used in a systematic study of fission paths and barriers of actinide and superheavy 
elements 



2.1.3 Lipkin Translational Energy Correction 



According to the Lipkin method [551151] . the linear-momentum projected energy of a system at 
rest can be calculated without the actual projection as: 

E$(0) = ($ I H-K I $), (34) 

where H is the two-body effective Hamiltonian and 

K = kP^ (35) 

is the Lipkin operator in the quadratic approximation, P = XliLi P« the total linear momentum 
operator and is a parameter to be determined. The optimum state is found by minimizing 
the right hand side of Eq. fl34|) . Note that the projected energy depends parametrically on 
the parameter k, that is, there is no variation with respect to k (no contribution to the HF 
potentials) . 

To determine the value of k, we proceed as follows j55| 15^. First define at each iteration the 
translated wave-function |$(R)) as: 

|$(R)) =e^^-^|$). (36) 
Then one can show that the correcting Lipkin parameter k reads: 

MRW^_ (37) 

where: 

, , , (<^' I H I $(R)) , , ($ I P2 I $(R)) ^ ^ 

MR) = l^L^J , P2(R) = i, ' \/S (38) 
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Figure 1: Lipkin projected energies (IMll relative 
to the standard SLY4 energies, calculated for the 
chain of lead isotopes. The results for the exact 
masses (Lipkin operator fl35|) with k = ko of Eq. 
( 139|) ) are given with the center-of-mass correc- 
tion added after (diamonds) and before (circles) 
variation. Similarly, for the renormalized masses 
{k of Eq. ( !37|) ). the results for closed sub-shells 
are given with the correction added after (trian- 
gles) and before (squares) variation. 



Figure 2: Ratios ko/k = M/mA of the 
renormalized and exact masses determined 
after (triangles) and before (squares) varia- 
tion. 



are the energy and momentum kernels. The calculation of k at each iteration is therefore 
straightforward, since it only requires to compute h{Il) and P2(R-) for a single (arbitrary) value 
of the shift vector R. The parameter k plays the role of a renormalized mass. It can be compared 
to the traditional so-called 1-body center-of-mass correction factor: 

where m is the nucleon mass. Since for the translational-symmetry restoration the Gaussian 
Overlap Approximation (GOA) is excellent [31], one can also approximate the Lipkin parameter 
by the GOA expression [35l [34] : 

41og'(($ I <I>(R))) 

where one assumes that h(Il) and log(($ | ^(R))) can be at the shift of R approximated by a 
parabola. The GOA expression is much faster to evaluate, because it does not require calculating 
kernels of the two-body operator P^. 

Figures [T]-[2] illustrate various aspects of the Lipkin method for linear momentum projection. 
In the captions of the figures, the term 'exact mass' refer to the quantity k^, and the term 
'renormalized mass' to the quantity k. 
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2.1.4 Shell Correction 



The shell-correction method relies on the Strutinsky energy theorem, which states that the total 
energy E of the nucleus reads: 

E = -Ebulk + (^-Rshell, (41) 

where -Ebuik varies slowly with proton and neutron numbers, and 5-Rsheii is a rapidly varying 
function of Z and N that captures the quantum corrections to the liquid drop [251 EZ] • It was 
demonstrated in [SB] that such a simple decomposition remains valid when the total energy E is 
computed microscopically as the integral of some energy density functional or expectation value 
of a two-body effective Hamiltonian at the Hartree-Fock approximation. 

The shell correction must be computed from a set of s.p. levels {e,}, which in hfodd are the 
Hartree-Fock s.p. energies. In its traditional form, it is given by: 

je{occ} \ i I smooth 

where i G {occ} represents a set of occupied states (for the HF vacuum, this is the set of 
the lowest Z or N levels), and the bracket (. . . )smooth represents the Strutinsky smoothing 
procedure. For the latter, we follow the prescription presented in [39] and applied on a large 
scale in macroscopic-microscopic calculations, e.g., in [lO] . 

The smoothed energy in expression (H2!) contains a spurious contribution from positive energy 
states Cj > which can become problematic when approaching the dripline. This spurious term 
can be removed by subtracting to Eq. fH21) the smooth energy obtained for an independent gas 
of particles [HI HI]. This leads to slightly different prescription for the shell correction: 

je{occ} I \ i / smooth \ * / smooth 

where the ti are the eigenvalues of the kinetic energy operator. The shell correction is computed 
twice, for protons and neutrons. For protons, the Coulomb potential must also be taken into 
account by doing the substitution ti — )■ (t-l-Vcou)j- The shell correction fH51) is of course evaluated 
at the convergence of the self-consistent HF calculation. Both estimates ^-Rghiu and ^-Rgheii 
shell correction are available in hfodd and are triggered by the value of the input parameter 
IFSHEL. 

2.2 New Numerical Features 

2.2.1 Two-basis Method for HFB Calculations 

The two-basis method was devised in Ref. [l3j to solve the HFB equations in spatial coordinates. 
The method allows for decoupling the particle-hole and particle-particle channels from one 
another and using the same technology as that developed for the HF+BCS method, even for the 
complete HFB problem. The essence of the method relies on the solution of the HFB equations in 
the basis of eigenstates of the particle-hole MF operator h. In spatial coordinates, this operator 
is not diagonalized in every iteration, but a set of eigenfunctions is evolved in imaginary time. 
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and thus they converge to eigenstates only at the end of the iterative process. In our case, the 
self-consistent equations are solved by using the HO basis and the imaginary-time evolution is 
not used; therefore, we implement the two-basis method by explicitly diagonalizing h. 
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This procedure has two advantages over the standard HFB method. First, the cutoff of the 
configuration space can now be performed in the s.p. space and not in the quasiparticle space. 
Therefore, the dimension of the HFB equations, reduced to s.p. states with energies e below the 
cutoff energy, e < Cmax, is much smaller than the full HO space. This speeds up the calculations. 
Second, HFB calculations in a reduced s.p. space do not suffer from formal drawbacks related 
to the cutoff in the quasiparticle space, see discussion in Ref. [H]. 

Figure [3] shows the CPU times required to perform the HFB calculations for ^^°Sn by using 
the cutoff energy of i^cut = 60 MeV, Skyrme functional SLY4, and pairing-force parameters of 
Eq. (14) in Ref. [4], Vq = -285.88 MeV, po = 0.32 fm'^, and a = 1. The results were obtained 
for bases of A^o = 14-20 and for four conserved-symmetry conditions, as indicated in the Figure. 
The CPU times scale as N^. It turns out that in the case of calculations performed without 
any conserved symmetry, the two-basis method can be up to 30% faster than the standard 
HFB method. However, almost half of this gain can be obtained by simply requesting in the 
standard HFB method that the HFB wave functions be calculated only below the quasiparticle 
cutoff energy (see keyword CUT_SPECTR). In view of this limited gain in the CPU time, in version 
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(v2.49t) the two-basis method is not implemented in the remaining three conserved-symmetry 
conditions. 

The two-basis method gives results that are close, but not identical, to those given by the 
standard HFB method. This is illustrated in Figure HI where the total energies (upper panel) 
can differ up to 200 keV, the pairing energies (middle panel) up to 1.5 MeV, and the neutron 
pairing gaps (bottom panel) up to 70 keV. Although these differences are non-negligible, they are 
probably inferior to other uncertainties of the approach, and at present the physical advantages 
or disadvantages of one method over the other cannot be established. One should stress that 
the two methods of implementing the cutoff give exactly the same numbers of quasiparticles, so 
the above difference are not caused by different sizes of the model spaces. 

2.2.2 Augmented Lagrangian Method 

Multi-constrained EDF calculations are a crucial ingredient of a number of nuclear structure ap- 
plications. The microscopic description of high-spin states is based on the cranking model, which 
requires a constraint on the value of the total angular momentum. Modeling the fission process 
involves the calculation of mult i- dimensional potential energy surfaces, where the constraints are 
imposed on expectation values of the multipole moments. Most generally, beyond-mean-field 
applications, or multi-reference EDF, rely on a basis of constrained MF states used to generate 
collective motion. 

In previous versions of hfodd, constraints on the nuclear shape took the standard quadratic 
form, see Eq. (22) in ^j. This so-called quadratic penalty method was chosen to avoid the 
pitfalls of the method of Lagrange multipliers (linear constraints), which often fails to converge. 
The quadratic penalty method is usually very fast and robust, but does not always yield the 
desired solution: the expectation value of the multipole moments at convergence may differ, 
sometimes significantly, from the requested values. In addition, it depends rather sensitively on 
the stiffness parameter, which controls the magnitude of the corrective term. 

The Augmented Lagrangian Method (ALM) provides a valuable alternative for multi-const- 
rained calculations [151 H5] . It is a general algorithm which aims at minimizing a scalar function 
S{x) of the vector x under a set of constraints gi{x) = g° (the so-called finite-dimensional, 
equality-constrained nonlinear optimization problem). In practice, it can simply be viewed as 
a smart combination of both the linear and quadratic penalty methods. Adopting the same 
notations as in Sec. 2.3 of [T], the total energy takes the form: 

S' = S - ^L,^((Q,^) - Q,^) + J2C^,{{Qx,) - Qx,y^ (44) 

where Lx^ is the Lagrange parameter for the multipole (A,/i), Cxn is the corresponding stiffness 
and the requested value for the multipole moment Qx/i- Note that the minus sign for the 
linear constraint term is a matter of convention. While the stiffness is an input parameter that 
remains constant along the iterations, Lx^ has to be re- adjusted. At iteration m + 1, the new 
Lagrange parameter reads: 

4r'^ = 4? - 2C.,((g.,)(™) - Qx,). (45) 

For a EDF solver already implementing the quadratic penalty method, adding the ALM is 
extremely simple: (i) The linear term in Eq. fH4l) must be added to the total energy, (ii) the 
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matrix elements of the corresponding HF potential [/^"^^ = — ^x^i ^\pQ>^tJ- i^ust be added to the 
HF(B) matrix, and (iii) the Lagrange parameter must be updated at every iteration according 
to Eq. ( H5|) . The method induces almost no computational overhead, is very robust, and always 
gives very precisely the requested solution, see [IT] . 

2.2.3 Linear Constraints Based on the RPA Matrix 

The ALM method does not make any specific hypotheses as to how the function S{x) is com- 
puted. Within the nuclear EDF, the function S{x) is the total energy of the nucleus, and is itself 
obtained as the solution to a variational problem. One may therefore take advantage of this 
additional information to adapt the standard optimization algorithm with linear constraints. 
Such an approach was proposed already 30 years ago in the context of the self-consistent nu- 
clear MF theory with finite range effective forces |18]. At every iteration, an estimate of the 
QRPA matrix at the cranking approximation is computed. This information is used to make 
an educated update of the Lagrange parameters Lx^ of the linear constraints. A detailed and 
pedagogical presentation of the algorithm and its applications for fission barriers calculations 
can be found in |19]. 

The implementation of the method in hfodd follows very closely the Appendix A of [19], 
and we refer to this work for further information. Let us note that this method requires the 
matrix of the constraint operator in the q.p. basis, which involves 8 matrix multiplications per 
iteration (4 for neutrons, 4 for protons). The computation of the constraints correlation matrix 
also requires additional matrix multiplications per iteration, where Ac is the number of 
constraints. When simplex symmetry is conserved, the properties of the basis in hfodd reduce 
the size of the matrices involved in all these operations to one half of the total basis size at 
most. The computation overhead can still be noticeable, but is always largely compensated by a 
drastic reduction in the number of iterations necessary to reach convergence and the near-perfect 
precision of the obtained solution. All major linear algebra operations (matrix multiplication 
and inversion) are carried out by BLAS and LAPACK routines. 

2.2.4 Interface with hfbtho 

The code hfbtho solves the Skyrme HFB equations in the HO basis by assuming axial and 
refiection symmetry ^U\. These built-in symmetries make the HFB matrix block-diagonal, and 
the typical run time of the program is at least an order of magnitude shorter than for HFODD. 
This makes hfbtho an ideal tool for large-scale calculations in cases where axial- and refiection 
symmetries are sensible assumptions [51]. Conversely, hfbtho is not appropriate for specific 
problems such as the description of nuclear fission, where the complexity of the nuclear shapes 
impose the use of a fully symmetry-unrestricted solver like hfodd. 

Both codes have been carefully benchmarked against one another in even-even [^2] and odd 
nuclei [53] at the equal-filling approximation. The difference of total energies in a nucleus like 
^^°Sn is typically of the order of 10 eV, and can entirely be attributed to the different techniques 
of computing the Coulomb potential. Such a nearly perfect match makes it possible to accelerate 
hfodd run time by coupling the two codes together: for a given nucleus, the calculation is first 
carried out by hfbtho (assuming axial and reflection symmetry), then restarted with hfodd 
after a simple unitary transformation. If the physical solution is axial and parity invariant, 
the HFBTHO solution is the correct one, and hfodd can stop after the basis transformation. 
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If the solution breaks any of these symmetries, the self-consistent procedure will continue until 
convergence. The motivation for such an interface is the observation that, for nearly all nuclear 
shapes, the driving deformation is the axial quadrupole moment Q2q- 

Let us denote by {|SIM„)} = {|na;, n^, n^; s = ±«)} the simplex-conserving Cartesian Har- 
monic Oscillator basis used in hfodd. We denote by {|CYL„)} the cylindrical harmonic oscil- 
lator basis used in hfbtho, |CYL„) = |A^, rip, n^. A, fi). The basis transformation {|CYL„)} — > 
{|SIM„)} proceeds in two steps: 

1. The coordinate transformation {|CYL„)} — ?► {|CAR„)} is carried out, where the {|CAR„)} 
= \nxiny,nz:,(j) are the Cartesian harmonic oscillator states and a = ±1/2 is the z- 
projection of the intrinsic spin; 

2. A unitary phase transformation is then performed to go to the good y-simplex basis: 
{|CAR„,)}^{|SIM„)}. 

Coordinate transformation - The spatial quantum numbers in Cartesian and cylindrical 
coordinates are related through: 

N = Tlx + riy + riz = 2np + A + n^. (46) 

Let us note n± = N — riz- In principle: 

< ?T-p < n±, and — n± < A < +n±. (47) 

However, quantum numbers A < (therefore Up > correspond to states which are the 

time-reversed partners of the states A > 0: In hfbtho, time-reversal symmetry is conserved, 
all basis states with A < are disregarded, and the HFB matrix "H is explicitly block-diagonal 
by f2 = A ± cr values. The full HFB matrix is therefore reconstructed from each f2-block. By 
a suitable reordering of indexes, it is then split into 4 matrices 7^'-'^°" \ The "H*^"""" ^ are purely 
spatial matrices with matrix elements of the type: 

(n;,A'<|-H(""')|npAn,). (48) 

The transformation of the matrix elements fl48p from cylindrical to Cartesian coordinates 
requires the overlaps: {nxnyUzlnpAnz) . We use the formulas given in |5l]. Let us recall: 

{nxnynz\npAnz) = 62np+A,n,+ny{-^) ""{-1) 



22np+A 



Pi 



max 



with: 



and: 



X y tl^^^ , (49) 

^ p\{n^-p)\{p + np-nx)\{np + A-p)\' 



P^.n ={' f'""^^ (50) 

^ Tlx - Up for Up <nx, 



_ , forn^. <np + A, 

^ Hp + A forn, >np + A. ^ ' 
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Figure 5: Stability of the HFODD solution after restart from HFBTHO at the spherical point in 
^^^Dy, as function of the deformation of the HO basis. 



The overlaps for A < are obtained from those with A > according to: 

{n^nyn^\npK(<Q)n^) = (-l)'"^' (n^nj,n;,|rapA(>o)n^)*. (52) 

Phase transformation - After the matrices are obtained in Cartesian coordinates, 

the phase transformation, Eqs. (78a)-(78b) of Ref. [1], is performed. Let us recall that it reads: 



n^UyU^, s = +i) 



1 
1 

7! 



\n^nyn^,a = -) - i 



''\n^nynz,a 



(53) 



where s = ±i is the y-simplex. In hfbtho, simplex and time-reversal symmetries are always 
conserved (by construction), so that the HFB matrix is block diagonal: 'H'-*^ ^ = Sss/'H^^^ and 
^(-s) _ '^(■5)* 'j]-^g blocks %'^^=^'^) are obtained by linear combinations of the 'H^"'^^ and the 
phase factors recalled in Eq.( l53|) . 

For axially- and parity-symmetric HFB solutions, the interface between hfbtho and HFODD 
gives a precision at restart that ranges from 1 eV to 1 keV depending on the nucleus, the 
characteristics of the basis and the quadrupole deformation of the solution. Almost all the 
error is contained in the direct Coulomb energy which is computed differently in the two codes. 
Figure shows the stability of the HFODD iterations immediately after restart from the hfbtho 
solution at the spherical point in ^^^Dy, for different deformations of the basis and different basis 
sizes. 
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2.2.5 Mixing of Matrix Elements of the HFB Matrix 



When solved by successive diagonalizations, as in hfodd, the Hartree-Fock equations are self- 
consistent. In practice, the iterative scheme is started with a set of initial conditions, formally 
some vector V^^\ that linearize the problem. Upon entering iteration m with a vector V^n™"*, 
solving the HP equations yields a new vector v}^^\ This vector is then used as input to the 
next iteration m + 1, VJ^™'*'^'' — )■ V}^K The iterations stop when jV^^^'*'^'' — V}^^\ < with e a 
measure of the convergence. In practice, the input vector at iteration m + 1 must be a mixing 
of Vt^ a 
the type: 



of V^n'"^ ^out'^ for the iterations to converge. This mixing can be a simple linear mixing of 



Vt^'^ = + (1 - a)vt^ (54) 

or more elaborate such as produced by the Broyden method [55]. 

Both the linear and Broyden mixing are implemented in hfodd. By default, the iterated 
quantities V are the values of the HF fields on the Gauss- Hermite integration mesh f6] . However, 
it was noticed that when the Lipkin-Nogami (LN) prescription is used, the matrix elements of 
the density matrix in the HO basis must also be added in order to ensure convergence. In the 
simplest case where time-reversal and simplex symmetries are conserved, and the LN procedure 
is applied to both protons and neutrons, the size of the Broyden vector is augmented by 4M^, 
where M is the size of the s.p. basis. For large bases, the size of the Broyden vector can thus 
become prohibitive. To by-pass this memory bottleneck, the mixing of the matrix elements of 
the HFB matrix has been implemented. 

In HFODD, the self-consistent loop is organized in such a way that at each iteration m, it 
is initialized with the set of HF fields (for neutrons and protons) V^l^\ and ends with the 
determination of the new fields v}^^ : it therefore lends itself naturally to mixing the HF fields. 
By contrast, the mixing of the matrix elements of the HFB matrix is most easily performed when 
the self-consistent loop starts with an initial HFB matrix H^^^ and ends with the computation 
of the new HFB matrix Hj^^^ (this is the case, e.g., in hfbtho). In order to conserve the 
'HF potential-oriented' structure of the self-consistent loop of hfodd, the mixing of the matrix 
elements of the HFB matrix must be done immediately after a new H^^^^ has been calculated: 
in HFODD such a condition actually requires independent mixing for protons and neutrons with 
two separate calls to the mixing routine and, in the case of the Broyden mixing, two different 
memory arrays. 

The size of the Broyden vector (for one isospin only) depends on the symmetries of the 
problem: simplex conserved ISIMPY=1 or broken ISIMPY=0, time-reversal symmetry conserved 
IROTAT=0 or broken IR0TAT=1, HFB calculations IPAHFB=1 or HF calculations IPAHFB=0. Taking 
also into account the symmetries of the matrix of the mean field and pairing field, the size of 
the Broyden vector is: 

N = (1+IROTAT) X + (1-ISIMPY) x 

+ IPAHFB [M(M + l)/2 + IROTAT x M(M - l)/2] (55) 

For a typical static HFB calculation with conserved time-reversal and simplex symmetry, a 
stretched basis such that NXHERM=NYHERM=30, NXHERM=60 and M = 1000, and 7 iterations 
conserved in memory, the Broyden method requires 302 MB of RAM when fields are mixed, and 
about 168 MB when matrix elements are mixed. 
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Figure 6: Convergence of the Cranking HF iterations in ^^^Tb with the Broyden method. Curves 
labeled 'Matrix' correspond to the original mixing of matrix elements; curves marked 'Noise' to 
the mixing of matrix elements when the Broyden memory is erased every 4 iterations; curves 
marked 'Fields' correspond to the mixing of HF fields on the Gauss-Hermite mesh. 



It has been observed that the mixing of the matrix elements of the HFB matrix in HFODD 
is less stable than in a comparable implementation in hfbtho. This numerical noise may be 
due to the fact that hfodd breaks a number of symmetries that are conserved in hfbtho, 
and which manifest themselves by numerically small, non-zero elements in the matrix. Correct 
performance for ground-state calculations can still be obtained if the memory of the Broyden 
method is erased every n iterations, with n < n^era where n^^ra is the number of iterations 
retained to compute the full Broyden correction (noise cancellation). In practice n^^ra = 4 gives 
decent results. 



2.3 Parallel Programming Model 

Starting in version (v2.49t), the code hfodd has built-in parallel capabilities. These capabilities 
are controlled by 3 different pre-processor options and are discussed below. 

2.3.1 Distributed Memory Parallelism 

Density functional theory is an efficient method to compute the properties of multi-fermionic 
systems. From a programming point of view, recasting all degrees of freedom of the problem 
into a single function of r, the local one-body density matrix, allows the formulation of a 
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compact, CPU- and memory-thrifty, implementation. In practice, the average computation 
time of standard nuclear EDF solvers ranges from a few seconds for a spherical closed-shell 
nucleus up to a few hours for a full symmetry-breaking configuration in a heavy nucleus. 

Such naive estimates, however, are deceptive: in many instances, the nature of the problem at 
hand requires the computation of many such configurations, as for example in the determination 
of Potential Energy Surfaces (PES), which are critical ingredients in the proper description of 
the nuclear fission process. While the time of calculation of any point of the N-dimensional PES 
may be of the order of a few hours, systematic and accurate mapping of the surface is required 
to compute reliable estimates of barrier heights, tunneling probabilities or collective inertia. For 
N = 5 degrees of freedom with Ui = 10 sample points each, the size of the mesh is 100,000: such 
a problem requires both supercomputers and an adapted programming model. 

One giant simplification of high-performance computing applications with EDF methods is 
that the theory generates by construction a naturally parallel computational problem: most 
of the time, all configurations can be handled independently by a single core (CPU unit) of a 
multi-core processor. The amount of inter-processor communication is therefore often rather 
low (coarse granularity). Such a property has made it possible to compute the entire mass table 
in less than a day \5T\ . 

The distributed memory programming model of HFODD contains two layers of parallelism 
managed by the standard Message Passing Interface (MPI) library. The outermost layer cor- 
responds to A'master master groups of cores, each group being in charge of computing a given 
nuclear configuration. The innermost or group layer is made of the A^proc cores in any given 
group. The division of the processor grid in these two layers is carried out at the very beginning 
of the code using standard MPI group and communicator routines. Most applications of hfodd 
do not require the group structure, that is, A^'proc = 1 is sufficient most of the time. Examples 
of distributed HFB calculations are discussed in Sec. 12.3.41 

From a user's point of view, running HFODD on several cores requires the following: 

• The code must be compiled by setting the pre-processor option USE_MPI=1. The user is 
in charge of ensuring that an implementation of MPI exists on his/her system. 

• Input data now falls into 2 categories: process-dependent and process-independent data, 
the word 'process' referring to a given HFB calculation. Process-independent data is 
everything that will be the same on every process. Contrariwise, process-dependent data 
is what changes from one process to the next: it is therefore what distinguishes the nuclear 
configurations [Z, N, constraints, etc.). For practical reasons, see Section [2.3.2^ process- 
independent data will always be contained in the file hf odd.d, and the process-dependent 
data always in the file hf odd_mpiio .d. 

2.3.2 Scalable Input Routines 

In its single core version, hfodd reads its input data from the system standard input. In 
practice, data is often contained in a simple ASCII text file, and the execution of the code uses 
input redirection. Input is read by the routine NAMELI, which loops over the file for specific 
keywords, each keyword being immediately followed by the relevant data. Such a structure 
provides good fiexibility, since the user can add or remove keywords at will from the input file. 
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In parallel mode, duplicating this structure of the I/O operations on all available cores is 
not efficient, and can in fact affect the stability of the entire system. Indeed, not only would all 
cores try to access the disks more or less at the same time, but they would all access the same 
file and seek different positions in it. It is a well-known rule of thumb that parallel I/O has to 
be handled either by one core only, or using dedicated libraries. In the specific case of hfodd, 
the amount of input data is rather small, common to and needed by all cores, and read only at 
the beginning of the calculation. The most natural solution is therefore to assign one core to 
the task of reading it and broadcasting it to the others. 

However, the fiexibility induced by the keyword structure of the input file now becomes a 
disadvantage, since the amount of data to be read (and then broadcast) is in fact known only at 
run time. This feature suggests the use of Fortran 90 linked lists for each basic type of input data 
(integer, double precision real number and character string). The entire I/O operation is then 
broken down in 3 phases: (i) construction of the linked list (ii) broadcast and (iii) reconstruction 
of local data. In the first phase, the routine mpi_getSequentialData() parses the file hf odd.d, 
which has exactly the same structure as the standard hfodd input file, and for each data, 
adds a node to the relevant linked list. At the end of the process, linked lists are copied to local 
allocatable arrays which are then broadcast to all cores using the MPI routine MPI_Bcast (second 
phase). All cores then acquire a local (to a given core) copy of 3 arrays, for the 3 basic types of 
data mentioned above. Finally, every element of these arrays is re-associated with the relevant 
local HFODD variables (third phase). This is done by the routine mpi_setSequentialData. 



500 



450 



400 



2^ 350 



300 



250 



200L 
10 



•-• Average 
Minimum 
▼ ■▼ Maximum 




10^ 10^ 10' 

Number of cores 



10= 



30 



<u 20- 



-D 10- 



10" 



10= 10' 

Number of cores 



10= 



Figure 7: Average, minimum and maximum 
CPU time for 6 HFB iterations in ^^^Dy as a 
function of the number of cores for a full spher- 
ical HO basis of A^^sheii = 14 shells. The pairing 
cut-off energy is i^cut = 60 MeV. Calculations 
were done on the Cray XT5 at NCCS, the code 
was compiled with the PCI v. 10.3 compiler 
with -fast -Mipa=fast options. The Cray 
libsci library was used for BLAS and LA- 
PACK. All cores do the same HFB calculation. 



Figure 8: Load imbalance for the same runs as 
in Fig. [71 Load imbalance is defined here as 

(-^max -^min) /-^avD whcrC Tmin (rCSp. Tm^x) is 

the minimum (resp. maximum) time of execu- 
tion over all cores, and Tavr the average time 
over all cores. 



In principle, it would have been enough to implement this scheme for the standard hfodd 
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input file, and add a few keywords relevant for process-dependent data. However, it proved more 
convenient to put all process-dependent data in a file of its own, with a similar keyword structure. 
The I/O process described above has therefore to be repeated for the process-dependent data. 
This is carried out by the routines mpi_getParallelData() and mpi_setParallelData. 

This implementation of the I/O procedure combines a number of advantages. First and 
paramount, it scales well with the number of cores available, since only one core is dedicated to 
accessing the disk and reading the data. Since the amount of data will always be very small (a 
few kB at most), the broadcast phase should not induce a very large communication overhead. 
In addition, the linked list structure conserves the flexibility to add/remove data from the input 
files, which also ensures backward and forward compatibility with all future releases of the code. 

Figure [7] shows the scaling properties of hfodd. The sample calculation consisted of 6 HFB 
iterations in ^^^Dy in a full spherical basis of = 14 shells with constraints on Q20 = 20 b 
and Q22 = b. All cores computed the same configuration. The PGI compiler vlO.3 with the 
-fast -Mipa=f ast options was used to compile the code, the Cray library libsci to link to 
BLAS and LAPACK, and all calculations were done on the Jaguar Cray XT5 at the NCCS. 
These technical characteristics are given because the actual time of execution can vary very 
significantly depending on the compiler/platform and compiler options used. 

Since all cores perform exactly the same calculation, any departure from a flat straight line 
should be attributed at first order to (i) the inter-core communication during the initial input 
setup (ii) filesystem operations to create/access files. While the input setup has been somewhat 
optimized, see above, all other I/O operations are the same as in serial mode: for the runs shown 
in Figs. (I3IH]), every core generates 2 files that remain opened with read/write permissions for 
the entire time of execution. To better grasp the impact of the lack of I/O optimization, figure 
IH] shows the load-imbalance of this calculation, defined here as: 

LI = ^"^'^^ ~ ^'"'^ (56) 

where Tmin (resp. Tmax) is the minimum (resp. maximum) time of execution over all cores, and 
Tavr the average time over all cores. Perfect load-balancing (equal distribution of work between 
cores) implies LI=0. 



2.3.3 Shared Memory Parallelism 

Many leadership class computers have adopted a hybrid distributed-shared memory architecture, 
whereby all cores are grouped into processors, each processor having access to its own physical 
memory. Tests of the current version of hfodd have been carried out on the Franklin Cray XT4 
and Jaguar Cray XT5, which are characterized by, respectively, 4-core processors with 8 GB 
shared memory and 12-core processors with 24 GB shared memory. As mentioned earlier, most 
applications of hfodd can run on a single core, so that in the case of the Jaguar supercomputer, 
12 simultaneous calculations can be run on a processor. However, for large basis size, the memory 
needed by one instance of hfodd can exceed the 2 GB/core available. 

This seemingly limitation of the prevailing architectures can be exploited to accelerate the 
execution of the code by using the standard OpenMP API. If the number of instances of the code 
running on a given processor is less than the actual number of cores in that processor, several 
cores are in fact inactive. The OpenMP API offers a very simple interface to recycle these cores 
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Figure 9: Multi-threading accelerations in 
HFB calculations of ^'^'^Hg for a deformed 
-^sheii = 26 HO basis with 1140 states and 
deformation 0^20 = —0.32. Calculations were 
performed on a Intel Xeon processor with 
the Intel Fortran Compiler and the -xSSE4 . 2 
-03 -override-limits options with stan- 
dard BLAS libraries, and are compared to 
Amdahl's law. 



Figure 10: Same as Fig. |9]for 6 HFB iterations 
in ^^^Dy in a full spherical basis of = 14 
shells (same characteristics as in Fig. [7]) and 
threaded BLAS libraries. Black circles: speed- 
up of the entire code; triangles: speed-up for 
the 3 impacted routines (error bars reflect the 
rounding of the time to the nearest second). 
Dashed line: perfect scaling for the OpenMP 
acceleration. 



for quasi-automatic parallelization of the code (multi-threading). The execution then consists 
of series of sequential instructions (on one core) combined with multi-core parallel sequences. 
This mechanism is particularly effective to (quasi-) automatically parallelize loops. Let us recall 
that OpenMP instructions, or pragmas, are coded in the form of comments only activated by 
the relevant option of the compiler: modifications of the source code remain minimal. 

We identified 3 subroutines of hfodd that could take a significant chunk of the total run 
time: subroutine DENMAC computes the density matrix from the HFB eigenvectors; subroutine 
SPAVER computes s.p. expectation values of operators; subroutine nilabs defines the Nilsson 
labels of s.p. states. All these routines involve 3-nested loops with Astates elements, where 
A'states is the number of basis states. For large bases with A'states ~ 1000 — 2000, these loops 
become time-consuming, and they cannot be re-arranged easily in a way that memory access is 
optimized. 

Figure [3 shows the OpenMP acceleration of a full HFB calculation in ^^°Hg with a large basis 
of 26 HO shells and 1140 states (basis deformation 0:20 = —0.32) as function of the number of 
threads. Calculations were performed on a cluster of Intel Xeon processors with the Intel Fortran 
Compiler and the -xSSE4.2 -03 -override-limits options. Standard (un-threaded) BLAS 
and LAPACK libraries were used. The 3 routines impacted by Open MP pragmas represent a 
small fraction P = 0.37 of the total execution time in this case, and the observed acceleration 
nicely aligns with predictions by the empirical Amdahl's model. 

Leadership class computers often provide threaded BLAS libraries. In Fig. [TOl we shows 
the acceleration induced by OpenMP in the same test case as in Sec. I2.3.2[ This profiling 
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experiment was done on the Jaguar Cray XT5 computer at the NCCS by hnking to threaded 
BLAS hbraries. At the level of the 3 modified routines, the scaling is perfect with the number 
of threads; overall acceleration is a little better than in the case of Fig. IH] due to the benefit of 
using threaded libraries. OpenMP acceleration is activated by setting USE_OPENMP to 1 in the 
Makefile. 



2.3.4 Parallelization of Diagonalization Routines 

One strength of the hfodd code is its ability to perform computations without assuming sym- 
metries. Calculations without symmetries, however, prove computationally expensive. Until 
now HFODD has been successfully used in several massively parallel applications, such as the 
survey of one quasi-p article states in odd mass nuclei [53] and potential energy surfaces for fis- 
sion [56]. The trend in massively parallel computing, however, is to reduce the memory available 
to each computing core, thereby indirectly imposing restrictions on the symmetries assumed in 
current-generation HFODD calculations. In Fig. [TTl the peak memory used in a hfodd run is 
plotted against the number of full shells in the harmonic oscillator basis. It is worth noting that 
the present NCCS Cray XT4 and Cray XT5 machines have 2GB available to a core, which is 
exceeded by a problem using 20 full oscillator shells. For problems like fission that require the 
calculation of highly deformed nuclei, at least 26 — 31 oscillator shells can be needed. 
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Figure 11: The memory high- water mark at- 
tained by HFODD is plotted against the num- 
ber of major harmonic oscillator shells used in 
the basis. 



Figure 12: Total execution time of hfodd 
spent for different diagonalization routines 
and numbers of rows used in a square ScaLA- 
PACK process grid. For single-core tests, 
HFBISZ was modified so that all eigenstates 
of the HFB matrix are computed, to allow 
meaningful comparisons with multi-core algo- 
rithms. 



To fully take advantage of the next generation of massively parallel platforms as well as 
its full symmetry-breaking capabilities, hfodd should therefore be scaled so that its solution 
of the HFB equations is distributed among an arbitrary number of cores. Before undertaking 
such a daunting task, though, the anticipated benefits must be assessed and demonstrated. In 



24 



this release of the code, we have tested whether we can achieve any gain in the speed of the 
diagonahzation routines by using the ScaLAPACK hbrary. 

Tests were conducted for the all symmetry-breaking HFB case (subroutine HFBSIZ), which 
involves the largest matrices, of size AN x 4A^ if N is the number of states in the HO basis. 
Fig. [12] shows the time of execution of 3 HFB iterations in ^^^Dy in a full spherical HO basis 
of Ai'sheii = 14 shells (A^ = 680 states). The code was compiled with the -fast -Mipa=fast 
option and run on the Cray XT5 computer at the NCOS. OpenMP acceleration was activated, 
and the BLAS and LAPACK routines were provided by the libsci library. Fig. [T^ shows the 
benchmark results for different diagonahzation methods and different ScaLAPACK core grids. 
Let us note that the ScaLAPACK library does not include a parallelized version of all LAPACK 
diagonahzation routines. From the strict perspective of execution time, the single-core version of 
HFODD invoking the ZHEEVR LAPACK routine (based on a very fast diagonahzation algorithm) 
may turn out to be faster than the multi-core version, which can only resort to PZHEEV or 
PZHEEVD ScaLAPACK methods. 

The module hf odd_sl provides an interface to the parallel matrix diagonahzation routines 
available in ScaLAPACK. The user may enable the use of the hf odd_sl module by setting the 
environment variable USE_SCALAPACK to 1 in the project Makefile (similarly, USE_SCALAPACK is 
set to to disable the module). The hf odd_sl module requires that the program is linked to a 
ScaLAPACK library and compiled with MPI, as well. The user should also specify the size of 
the process grid in the project Makefile through the environment variables M_GRID (number of 
process rows) and N_GRID (number of process columns). The program will stop with an error, 
if the product M_GRID x N_GRID times the number of different HFB configurations exceeds the 
number of processes allocated for the program. 

2.4 Corrected errors 

In the present version (v2.49t), we have corrected the following little significant errors of the 
previous published version (v2.40h) [B]. 

2.4.1 Coulomb energies 

For the combination of the Coulomb parameters IC0UDI=1 and IC0UEX=2 (IC0UDI=2 and 
IC0UEX=1), the direct (exchange) Coulomb energy was inadvertently set to zero. 

2.4.2 Skyrme parameters 

For SLY5 and SLY7, the predefined values of the Skyrme-force parameters corresponded to those 
given in Ref. [57], while for SLY4 they corresponded to unrounded numbers communicated by 
the authors of the force. At present, all these parameters are coded according to Ref. [5B], while 
the previous values are kept in the code under acronyms sLy4, sLy5, and sLy7. 

2.4.3 Quasiparticle blocking 

The time reversal of s.p. states was incorrectly coded for IDSIGB = —1 (in BLOSIG), IDSIMB = —1 
(in BLOSIM) IDSIQB = -1 (in BLOSIQ), IDSIZB = -1 (in BLOSIZ). Moreover, the quasiparticle 
blocking was incorrectly performed in BLOSIZ and, in some situations, these errors compensated 
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one another. One should stress that the calculated results were always correctly converged, but 
they could have corresponded to other blocked quasiparticles as compared to what was requested 
in the input data. 

2.4.4 Yukawa mean fields 

For calculations related to the Yukawa interaction that were not supposed to be using the 
mean fields stored on the disc (IFCONT = 0), in the first iteration the Yukawa mean fields were 
inadvertently set to zero. Since later these mean fields were calculated correctly, the error was 
only affecting the continuation of calculations from the disc, while the converged results were 
correct. 

2.4.5 The Broyden method 

In the subroutine DOBROY, the input parameter ALPHAM was inadvertently reset to 1— SLOWEV 
and thus it was ineffective. 

3 Input Data File 
3.1 Physics 

Keyword: PROJECTISO 

0, 2, 1, l.E-6, 0, = IPRGCM, ISOSAD, NBTKNO, EPSISO, ICSKIP, IFERME 
For IPRGCM> 1 and NBTKNO> 1, the isospin projection is carried out. The isospin projection 
is performed for all values of isospin T such that Tmin < T < Tmin + AT. In the current 
implementation, Tmin = {N — Z)/2, and AT := ISOSAD/2. The number of Gauss- Legendre 
points required to perform integrations over the Euler angle Pt is given by NBTKNO. By setting 
ICSKIP=1, in the projection routines the Coulomb interaction can be switched off, that is, in 
Eq. ([7]) can be neglected. Parameter EPSISO gives Et and controls the number of good- 
isospin states. Parameter IFERME controls the calculation of the Fermi matrix element (fT3|) . 
which proceeds in two independent runs. In the first run, for IFERME=— 1, the wave-function 
\I = 0,T ^ l,Tz = ±1) is computed and stored in the external file under the name specified in 
the keyword WAVEF-FILE. Next, for IFERME=+1, the matrix element ( IT3l) is computed. This run 
uses information on the |/ = 0, T ?a 1, T^ = ±1) state calculated in the first step and supplied 
in the file specified again in the keyword WAVEF-FILE. 

Current restrictions: In version (v2.49t), the isospin projection is only available at the 
Hartree-Fock level, that is, it requires IPAIRI=0. When the full Hamiltonian is re-diagonalized 
(ICSKIP=0), the Coulomb potential must be computed exactly, that is, by expanding both the 
direct and exchange terms onto a sum of Gaussians, see Sec. 2.10 of Ref. [7]. This requires 
setting IC0UDI=2 and IC0UEX=2 in the input. The method also imposes setting IROTAT to 1. 
Note also that IPRGCM> 1 activates either the isospin-, or angular-momentum projection, or 
both (see keyword PROJECTGCM described in [Tj). To run the isospin- or angular-momentum 
projection alone, one needs to set the numbers of integration points NUAKN0=1 and NUBKN0=1 
or NBTKN0=1, respectively. 
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Keyword: COULOCHARG 

1.0 = E_EFFE 

E_EFFE is the factor that multiphes the value of the elementary charge used in the Coulomb mean- 
field. In this way the strength of the Coulomb interaction can be modified or, for E_EFFE=0, the 
Coulomb interaction can be switched off. Note that this factor does not change the strength of 
the Coulomb interaction rediagonalized within the isospin-projection method, see Sec. 12.1.11 

Keyword: FINITETEMP 

0.0 = TEMP_T 

This keyword controls the value of the nuclear temperature (in MeV). If TEMP_T>0, The finite- 
temperature HFB or HF+BCS calculations are performed. 

Keyword: SHELLCORCT 

= IFSHEL 

For IFSHEL=1, the traditional shell correction is computed at convergence from the HF 

(2) 

s.p. energies. If IFSHEL=2, the shell correction ^-RghgU computed, which includes the removal 
of spurious contributions from positive energy states. For IFSHEL=0, shell correction is not 
computed. 

Keyword: SHELL? ARAM 

1.2, 1.2, 4.5, 6 = GSTRUN, GSTRUP, HOMFAC, NPOLYN 
This item adjusts the parameters of the shell correction. The variable GSTRUN (GSTRUP) stands for 
the 7n (7p) smoothing parameter for the neutrons (protons). HOMFAC is the multiplicative factor 
a that defines the energy window for the shell correction, according to ahco, with hoj = Al/A^^^. 
NPOLYN is the number p of Hermite polynomials used in the expansion of the smooth density. 
Preset values are a good choice to use for IFSHEL=1. For IFSHEL=2, the recommended values 
are 7n = 1.54, 7p = 1.66, a = 4.5 and p = 10. 

Keyword: RENORMASS 

0, 0.0, 0.0, 0.0 = IRENMA, DISTAX, DISTAY, DISTAZ 
For IRENMA > 1 the renormalized translational mass is determined in each iteration by using 
components of the shift vector R, DISTAX, DISTAY, and DISTAZ (in fm), in the x, y, and z 
direction, respectively, multiplied by IRENMA, see Eqs. fl55]) and fHU]) . For IRENMA=0, the mass 
is not renormalized. 

Keyword: GAUOVERAPP 

1 = IDOGOA 

For IDOGOA=0 or 1, the translational mass is determined by using the Lipkin method (1381) or the 
GOA expression (HOj) . respectively. However, even for IDOGOA=0 the GOA mass is calculated 
and printed for reference. For IRENMA=0, the value of IDOGOA is ignored. 

Keyword: UNEDF_PROJ 

= IF_EDF 

Setting IF_EDF=1 activates specific parameterizations of the Skyrme energy functional for which 
volume coupling constants are determined automatically from nuclear matter properties (used 
as inputs) and surface coupling constants are preset as usual, following the strategy laid out in 
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3.2 Numerical Methods 



Keyword: TWOBASIS 

= ITWOBA 

For ITW0BA=1, the two-basis method is used to diagonahze the HFB Hamiltonian. ITW0BA=1 
requires IPAHFB=1. The two-basis method is currently implemented only for the no-symmetry 
case; therefore, ITW0BA=1 requires ISIMPY=0 and IPARTY=0. 

Keyword: CUT_SPECTR 

= LIMQUA 

For LIMQUA=1, the HFB quasiparticle energies are calculated only up to the cut-off energy of 
ECUTOF, see Sec. 3.1 in ^. LIMQUA=1 requires IPAHFB=1. 

Keyword: MULTLAGRAN 

0, 0, 0.0, = LAMBDA, MIU, QLINEA, IFLALQ 
For IFLALQ=1, the linear multipole constraint is used in conjunction with the quadratic multipole 
constraint (see keyword MULTCONSTR) to implement the ALM for the total multipole moment 
constraint of multipolarity A and fi. The value of QLINEA is the initial value for the Lagrange 
parameter L^*^. Updates of the parameter in the ALM method are defined by Eq. fHSl) . The 
calculated values of the Lagrange parameters are stored on the record file; this allows for a 
smooth continuation of the ALM method when restarting calculations from disk, see keyword 
CONTAUGMEN. 

For IFLALQ=— 1, only the linear multipole constraint is used for the multipolarity A and fi. 
This option is used together with IF_RPA=1. For IFLALQ=0, linear constraints are switched off. 

Keyword: SURFLAGRAN 

0, 0, 0.0, = LAMBDA, MIU, SLINEA, IFLALS 
This keyword is the exact analog of MULTLAGRAN for surface and Schiff moments, see keywords 
SURFCONSTR or SCHICONSTR) in Sec. 2.4 of l^- Additional values for the flag IFLALS are pos- 
sible: for IFLALS=2 or 3, the ALM is applied only for the neutron or proton (surface-moment 
or Schiff-moment), respectively. The values of IFLALS must be the same for all constrained 
multipolarities. 

Keyword: CONTAUGMEN 

= lACONT 

For IAC0NT=1, the Lagrange parameters L\f^ for the linear constraints will be initialized with 
the values read from the record file. 

Keyword: RPA_CONSTR 

= IF_RPA 

For IF_RPA=1, the Lagrange parameters of the linear constraints will be updated auto- 
matically at each iteration based on the approximation of the RPA matrix. In hfodd version 
(v2.49t), this option is only available for HFB calculations for conserved simplex (ISIMPY=1). 
To be activated, it also requires the flags for linear constraints IFLALQ to be set (see keyword 
MULTLAGRAN). While in the ALM, these flags are all set to 1, in the method based on the RPA 
matrix they must be set to —1. 

Keyword: HFBTHOISON 

0, 0.0 = IF_THO, CBETHO 
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For IF_TH0=1, the code will automatically attempt to perform the requested calculation, first 
with the HFBTHO solver, then by automatic restart with the standard hfodd engine. All options 
specific to hfodd and not implemented in hfbtho will simply be disregarded in this first stage. 
As an experimental feature, it is also possible to restart constrained calculations, in which case 
CBETHO is the value of the deformation used to start hfbtho. 

Keyword: BROYDENMAT 

4, = NOIINP, MIXMAT 

For MIXMAT=1, the iterations of the self-consistent method proceed by mixing the matrix el- 
ements of the HF(B) matrix instead of the HF fields. This option is compatible with both 
IBR0YD=1 and IBROYD=0. It has been noticed that the mixing of matrix elements is less stable 
than the mixing of the fields, unless the Broyden memory is erased every n iterations. The value 
of n is given by NOIINP, and it is recommended to take n less than the number of iterations kept 
in the Broyden memory. 

Keyword: PARA_ALL 

0, 1, 1, 1, 1 = IPAALL, NUBSTA, NUBSTO, NUTSTA, NUTSTO 
For IPAALL=1, calculations of kernels for different values of the Euler angle (3 and the gauge an- 
gle proceed in the same way as those for the a and 7 Euler angles, see keyword PARAKERNEL 
in Sec. 3.2 in [7]. This allows for performing the calculation of kernels in parallel (in differ- 
ent runs of the single-core version of hfodd), and later using the calculated kernels for the 
angular-momentum and isospin projection. Calculations are performed for the nodes in the 
Euler angle (3 from NUBSTA to NUBSTO and for those in the gauge angle /3t from NUTSTA to 
NUTSTO. Values of NUBSTA and NUBSTO must be between 1 and NUBKNO and must be ordered as 
NUBSTA<NUBSTO. Values of NUTSTA and NUTSTO must be between 1 and NBTKNO and must be 
ordered as NUTSTA<NUTSTO. IPAALL=1 requires IPAKER=1. 

Keyword: NUMBKERNEL 

= KFIKER 

For KFIKER>0, the automated procedure of naming the kernel files (see keyword SAVEKERNEL in 
Sec. 3.2 in [7]) is suspended and the kernels are saved in the kernel file carrying the consecutive 
number equal to KFIKER. This requires an explicit bookkeeping of the kernel-file names in the 
input data, but has the advantage of preventing two parallel jobs from accessing the same 
kernel file simultaneously. Only the values of KFIKER between and 999 are allowed. KFIKER>0 
requires IPAKER=1. 

3.3 High-Performance Computing 
3.3.1 List of active keywords in hfodd. d 

In parallel mode, the code hfodd (v2.49t) reads all user-defined sequential data from the input 
file named hfodd. d. Since this version is the first to embed parallel capabilities, many hfodd 
options have not been implemented in parallel mode yet. Only a subset of hfodd keywords can 
therefore be activated, the list of which is given below: 

• Iterations - ITERATIONS, BROYDEN, SLOW_DOWN, SLOW_PAIR, SLOWLIPKIN, ITERAT_EPS, 
MAXANTIOSC, PING_PONG, CHAOTIC, 
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• Specific features - FINITETEMP, SHELLCORCT, HFBTHOISON, SHELLPARAM, COULOMBPAR, 
SKYRME-SET, SKYRME_STD, UNEDF_PROJ, 

• Constraints - OMEGAY, 

• Symmetries - SIMPLEXY, SIGNATUREY, PARITY, ROTATION, TSIMPLEX3D, 

• Pairing - PAIRING, HFB, CUTOFF, BCS, HFBMEANFLD, LIPKIN, PAIR_INTER, PAIRNINTER, 
PAIRPINTER, 

• HO Basis - OPTI_GAUSS, GAUSHERMIT, BASIS_SIZE, SURFAC_DEF, 

• Miscellaneous - ONE_LINE, NILSSONLAB, REVIEW, 

• Restart options - RESTART, CONT_PAIRI, CONTLIPKIN, CONTFIELDS, EXECUTE. 

In principle, these options provide enough flexibility to cover the majority of hfodd applica- 
tions in parallel mode. The user interested in some speciflc option which could not be activated 
by one of the keywords above can still manually modify the routine PREDEF prior to compilation. 
This routine pre-deflnes all hfodd input data. 

3.3.2 Structure of hfodd_mpiio.d 
Keyword: CALCULMODE 

1, = MPIDEF, MPIBAS 

For MPIDEF=1, the code will perform a simple grid calculation of A^p x x A^def points where 
A^p is the number of points along the Z— axis (proton number), the number of points along 
the A^— axis (neutron number), and N^et the total number of constraints on deformations, see 
keyword MULTICONST below. Requires IFC0NS=1. For MPIBAS=1, the calculation grid will be 
given by A^p x A^^ x A"def x A^ho ^ ^ , where A^ho is the number of different oscillator shells in 
the basis, the number of different deformations of the basis, and A^^^ the number of different 
oscillator frequencies (in MeV), see also keyword BASIS-NSHL, BASIS-DEFS and BASIS-FREQ 
below. 

Keyword: CONSTR_DEF 

1 = IFCONS 

For IFC0NS=1, every calculation performed by the code will be constrained on the relevant 
values of the multipole moments. 

Keyword: CONSTR_LIN 

1 = IFLINE 

For IFLINE=1, constrained calculations in multi-core mode are performed with the RPA method 
of re-adjusting the linear constraints, see Sec. 12.2.31 

Keyword: ALL_NUCLEI 

66, 2, 1, 86, 2, 1 = IZSTRT, IZSTEP, NSTPEZ, INSTRT, INSTEP, NSTEPN 
Deflne a vector of proton and neutron numbers 

Z(z) = Z{0) + (z - 1)6Z, ^ = 1, . . . , ATp, N{j) = N{0) + (j - 1)6N, j = l,...,N^. (57) 
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Then, Z(0):=IZSTRT, 5Z:=IZSTEP, Arp:=NSTEPZ, A^(0):=INSTRT, 5A^:=INSTEP, iVni^NSTEPN. 
This defines a (rectangular) subset of nuclei in the isotopic chart for which (possibly A^def > 1) 
calculations will be performed. 

Keyword: MULTICONST 

2, 0, 10.0, 10.0, 4 = LAMBDA MIU, QBEGIN, QFIN, NUMBERQ 
The deformation grid is defined as a set of A^def = H^^iVA^ deformation points where Nx^ is the 
number of points for the constraint on the multipole moment Qx^ with multipolarity A (LAMBDA) 
and fjL (MIU). The point number k for this constraint reads: 

Qam(^) = Qam(O) + -^^^ [Qx^Nx^.) - QxM , (58) 

with (5am(0):=QBEGIN, (5a^(A^a^):=QFIN and iVA^:=NUMBERQ. Multiple constraints arc obtained 
by adding several lines with different A and /i. All such lines must begin with A < except the 
last one. 

Keyword: BASIS-NSHL 

8, 2, 1 = N_MINI, N_STEP, NOFSHL 
For MPIBAS=1, the number of shells in the basis A^ho can take different values of the form: 

A^shell(m) = A^shell(O) + (m - l)5iV3hell, 711=1,..., NjiQ. (59) 

Then, iVshcii(0):=N_MINI, 5A"shcii:=N-STEP, iVHo:=NOFSHL. Note that the number of states is 
set independently as a sequential data under keyword BASIS_SIZE. While the familiar relation 
Astates — (A'sheii + l)(A^sheii + 2)(Asheii + 3)/6 between the number of states and the number of 
HO shells is valid for spherical bases, it does not apply in the deformed case. 

Keyword: BASIS-DEFS 

0.0, 0.1, 1 = B20MIN, B20STP, N0FB20 
For MPIBAS=1, the axial quadrupole deformation /3 = a2o can take different values of the form: 

/3(n) = /3(0) + (n - 1)5/3, n = l,...,Ar^. (60) 

Then, /3(0):=B2OMIN, 5^:=B20STP, A^a:=N0FB20 
Keyword: BASIS-FREQ 

8.0, 0.1, 1 = 0_MINI, 0_STEP, NOFFRE 
For MPIBAS=1, the oscillator frequency hcu can take different values of the form: 

?iuj{l) = ?iu;{0) + {l-l)Suj, l = l,...,N^. (61) 

Then, ^c<j(0):=OJ«NI, 5a;:=0_STEP, A^^:=NOFFRE 

4 Output Files 

Four additional examples of output file, illustrating the new features of the code are provided 
in files ca40_iso . out, cr48_lip.out, cr50_tein. out and pb208_tho . out. Selected lines from 
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ca40_iso.out are given in Appendix B below. Several minor sections of the output have been 
added or reformatted. We describe below only the non-trivial important additions. 

In ca40_iso.out, the section labeled ISOSPIN-MIXED EIGENSTATES lists all the eigenvectors 
(within the er cut-off described in Sec. I2.1.ip of the Hamiltonian re-diagonalized in the good- 
isospin basis. The first two columns are the number n and value -En.r^ of the energy. The next 
5 columns give the characteristics of the expansion of Eq. f|T2l) (or Eq. ffTSj) if isospin and angular 
momentum are combined). The fourth and fifth columns give respectively the number m and 
isospin T of the good-isospin basis state. The columns 6, 7 and 8 give, respectively, the norm 
of the expansion coefficient a^^, its real part and and its imaginary part. 

In cr50_tem. out, the new value I_LINE=3 has been used to display the iterations. Every 
line is made of the iteration number ITER, the value of the total energy ENERGY, the value of 
the stability criterion STABILITY, the total quadrupole moment Q_2 the 7 angle GAMMA, the total 
entropy S, the neutron Fermi level lam_n, the proton Fermi level lam_p, the neutron pairing 
energy EpN and the proton pairing energy EpP. With the values of the Fermi levels, total entropy 
and temperature, the grand canonical potential VL of Eq. ffTOj) can be deduced at each iteration. 

In pb208_tho . out, a warning message is displayed at the beginning to indicate that the 
calculation will proceed in two steps, first with hfbtho then with hfodd. The message also 
gives the conditions under which the restart is expected to be smooth. Follows the output 
generated by hfbtho, we refer to [50] for additional information. Since the UNEDF functional 
is used in this test run, an additional section NUCLEAR MATTER PROPERTIES lists the volume 
nuclear matter characteristics of the functional. Section THE SHELL CORRECTION. . . gives the 
type of shell correction computed and the numerical characteristics of the smoothing procedure. 
Section SHELL CORRECTION located just before the final energy table gives the value of the shell 
correction for neutrons and protons. 

5 Fortran Source Files 

The Fortran source code is provided as a set of 8 module files: 

• hf249t.f - Main source code 

• hf odd_f unctional . f 90 - Definition of energy functionals based on nuclear matter prop- 
erties and coupling constants instead of (t,x) parameters. 

• hf odd_shell . f - Shell correction 

• hfodd_hfbtho.f90 - HFBTHO code (v.lOla) 

• hfodd_interface.f90 - Interface between HFBTHO and HFODD 

• hf oddjupiio . f 90 - Module handling I/O in parallel mode 

• hf odd_mpimanager . f 90 - Module defining parallel jobs 

• hf odd_SLsiz . f 90 - Scalapack module for routine HFBSIZ 

together with one header file, hf odd_sizes .h, which contains all Fortran PARAMETER statements 
controlling the size of static arrays. The language of newly-developed modules is Fortran 90, 
while legacy code is still written, in part or totally, in Fortran 77. 
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5.1 Standard Libraries 

The code hfodd requires an implementation of the BLAS and LAPACK hbraries to function 
correctly, see Sec. 5.2 in [7] for details. While the interface to older NAGLIB routines remains 
available, BLAS and LAPACK will give the best peak performance on most current computers 
and are recommended. 

5.2 Parallel Mode 

We recall that a parallel machine is made of a certain number of sockets, each containing one 
processor. Every processor contains a number of CPU units, or cores, sharing the same memory. 

5.2.1 Basic MPI 

To activate multi-core calculations, hfodd requires an implementation of the Message Passing 
Interface (MPI). The current version was tested on two different implementations: 

• MPICH-1 and MPICH-2, available at: 

http: / / www. mcs. anl. gov / research / pro j ects / mpich2 / 

• Open MPI available at: 'http:/ /www.open-mpi.org/ 

In parallel mode, the code hfodd is compiled by setting USE_MPI to 1 in the project Makefile. 
Typically, the executable is run as follows (bash syntax): 

mpiexec -np [number of processes] hf249t < /dev/null >& hf249t.out 

where hf 249t . out is a redirection for the standard output and files hf odd. d and hf odd_mpiio . d 

must be in the directory where this command is run. 

5.2.2 Hybrid OpenMP/MPI Mode 

Multi-threading is activated by switching the USE_OPENMP to 1 in the project Makefile. This op- 
tion can be used on its own, or in combination with USEjy[PI=l, in which case the programming 
model is hybrid MPI/OpenMP. We recall that to activate multi-threading, the environment 
variable OMP_NUM_THREADS must be set to the required number of threads prior to execution. If 
every processor has 6 cores, then to run 12 MPI processes with 3 threads each, the following 
command line (in the OPENMPI implementation) should be executed: 

export OMP_NUM_THREADS = 3 

mpiexec -np 12 -npersocket 2 hf249t < /dev/null >& hf249t.out 

Therefore, instead of the 12 MPI processes being executed by all the 12 cores of 2 full processors, 
the -npersocket 2 option imposes that only 2 cores within a given socket are actually used, 
leaving the remaining 4 available when multi-threading kicks in. Such an instruction requires 6 
processors instead of 2 in the pure MPI mode, and up to (12 processes) x (3 threads) = 36 cores 
may be active at a given time . 
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5.2.3 Scalapack 



Using Scalapack requires the most advanced partitioning of the core grid. The hbrary can be 
downloaded at: 



http:/ /www.netlib.org/scalapack/ 
It relies on the BLACS framework, which is available at 



http:/ /www.netlib.org/blacs 



To compile the code with the Scalapack library, switch USE_SCALAPACK to 1 in the project 
Makefile. Note that Scalapack requires a multi-core grid and can therefore not be used in serial 
mode: it requires to set USE_MPI to 1 in the Makefile. Optimal performance can be obtained by 
also allowing multi-threading. The syntax of the command line is unchanged compared to the 
basic MPI or hybrid model. However, special care must be taken in the choice of the number 
of cores: the total number of cores is now the product of the number of cores for each HFB 
calculation (size of the Scalapack process grid) by the number of different HFB calculations 
required (defined in hf odd_mpiio . d). 
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A Test Run Input 



This file is part of the official HFODD v2.49t release and demonstrates 
the use of isospin mixing and projection. 



General data 



NUCLIDE IN.FIX IZ.FIX 

20 20 
ITERATIONS NOITER 
100 

ITERAT.EPS EPSITE 

0.000000001 

SLOW.DOWN SLOWEV SLOWOD 

0.5 0.5 

PRINT. ITER IPRSTA IPRMID IPRSTO 





MAXANTIOSC NULAST 
3 

BROYDEN 







IBROYD N.ITER ALPHAM BROTRI 
1 7 0.3 10000.0 



Interaction 

SKYRME-SET SKYRME 

SV 

SKYRME-STD ISTAND KETA.J KETA_W KETACM KETA.M 
10 1 

LANODD XO.LAN X1_LAN GO.LAN GOPLAN Gl.LAN GIPLAN 

1.0 1.0 0.4 1.2 -0.19 0.62 

RHOD LPR TAU SCU DIV 

1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 

SPID LPS CUR KIS ROT 

1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 

RHOD LPR TAU SCU DIV 

1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 

SPID LPS CUR KIS ROT 

1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 



LANDAU 

000 

EVE_SCA_TS RHO 
1. 1 

ODD_SCA_TS SPI 
1. 1 

EVE_SCA_PM RHO 
1. 1 

ODD SCA PM SPI 



PAIRING 
HFB 



IPAIRI 



IPAHFB 




Pairing 
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ROTATION 

SIMPLEXY 

SIGNATUREY 

PARITY 

TSIMPLEX3D 

PHNONE.NEU 
PHNONE.PRO 
DIANON.NEU 
DIANON.PRO 
VACSIM.NEU 
VACSIM.PRO 
VACPAR.NEU 
VACPAR.PRO 

BASIS.SIZE 
SURFAC.PAR 
OPTI-GAUSS 
GAUSHERMIT 
SURFAC.DEF 

OMEGAY 



Symmetries 

IROTAT 
1 

ISIMPY 


ISIGNY 



IPARTY 


ISIMTX ISIMTY ISIMTZ 




PARTICLES 
1 000 

PARTICLES 
1 000 

24 23 

24 23 

SIMP SIMM 

12 12 
SIMP SIMM 

10 10 
PARP PARM 

14 10 
PARP PARM 

14 6 



Configurations 

HOLES 

000 
HOLES 

000 



Parameters of the HO basis 

NOSCIL NLIMIT ENECUT 

10 286 800.0 

INNUMB IZNUMB ROPARM 

20 20 1 . 23 
lOPTGS 
1 

NXHERM NYHERM NZHERM 

26 26 26 

LAMBDA MIU ALPHAR 

-2 0.0 

4 0.0 

Constraints 

OMEGAY 
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OMEGA.XYZ 
MULTCONSTR 



EALLMINMAX 

REVIEWFILE 

RECORDFILE 

REPLAYFILE 

REC.FIELDS 

REP.FIELDS 

COULOMFILE 

REVIEW 

RECORDSAVE 

COULOMSAVE 

FIELD.SAVE 

FIELD.OLD 

RESTART 

CONTFIELDS 

CONT.PAIRI 

EXECUTE 



0.00 

OMEHAX 
0.000 
LAMBDA 
-2 
2 



EMINAL 
-30.0 



OMEHAY 
0.000 
MIU 

2 



EMAXAL 
10.0 



FILREV 
ca40_iso . rev 

FILREC 
ca40_iso .rec 

FILREP 
ca40_iso . rec 

FILFIC 
ca40_iso.f il 

FILFIP 
ca40_iso . f il 

FILCOU 
ca40_iso . cou 

IREVIE 


IWRIRE 

1 

ICOULI ICOULO 

1 1 
IWRIFI 

1 

IWRIOL 
1 



ICONTI 



IFCONT 


IPCONT 




OMEHAZ 
0.000 

STIFFQ 
0.25 
0.25 



ITILAX 



QASKED 
0.200 
0.000 



IFLAGQ 
1 
1 



Output-file parameters 



Files 



Starting the iteration 



Calculate 
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Next run 



ITERATIONS 
SLOW.DOWN 
BROYDEN 
MULTCONSTR 

COULOMBPAR 
RESTART 
CONTFIELDS 
CONT.PAIRI 

EXECUTE 

ITERATIONS 

SLOW.DOWN 

KERNELFILE 

SAVEKERNEL 

PARAKERNEL 

PROJECTGCM 

CUTOVERLAP 

PROJECTISO 

RESTART 

CONTFIELDS 

CONT.PAIRI 



NOITER 

30 
SLOWEV 

0.5 
IBROYD 


LAMBDA 
-2 
2 

ICOTYP 

5 

ICONTI 
1 

IFCONT 
1 

IPCONT 




SLOWOD 
0.5 

N.DUMM 
7 

MIU 

2 

ICOUDI 
2 



ALPHAM 

0.3 
STIFFQ 
0.25 
0.25 
ICOUEX 
2 



BROTRI 
10000.0 

QASKED 
0.200 
0.000 



IFLAGQ 





Calculate 



Next run 



NOITER 
1 

SLOWEV 

1.0 
FILKER 
ca40_iso.ker 
ISAKER 

1 

IPAKER 


IPRGCM 
1 

ICUTOV 
1 

IPRGCM 
1 

ICONTI 
1 

IFCONT 
1 

IPCONT 




SLOWOD 
1.0 



NUASTA 
1 

IPROMI 




NUASTO 
1 

IPROMA 




CUTOVE CUTOVF 
l.OE-05 1.0 
ISOSAD NBTKNO 
10 8 



NUGSTA NUGSTO 



NUAKNO 
1 



EPSISO 
l.D-10 



NUBKNO 
1 



ICSKIP 




KPROJE 




IFERME 




IFRWAV 
1 



ITOWAV 
1 



IWRWAV 
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Calculate 

EXECUTE 

Terminate 

ALL.DONE 

B Test Run Output 

* * 

* SINGLE-CORE VERSION * 

* * 

* * 

* HFODD HFODD HFODD HFODD HFODD HFODD HFODD HFODD * 

* * 

* SKYRME-HARTREE-FOCK-BOGOLYUBOV CODE VERSION: 2.49T * 

* * 

* NO SYMMETRY-PLANES AND NO TIME-REVERSAL SYMMETRY * 

* * 

* DEFORMED CARTESIAN HARMONIC-OSCILLATOR BASIS * 

* * 

* * 

* J. DOBACZEWSKI, B.G. CARLSSON, J. DUDEK, J. ENGEL * 

* J. MCDONNELL, P. OLBRATOWSKI , P. POWALOWSKI, M. SADZIAK * 

* J. SARICH, W. SATULA, N. SCHUNCK, J. A. SHEIKH * 

* A. STASZCZAK, M. STOITSOV, P. TOIVANEN, M. ZALEWSKI * 

* AND H. ZDUNCZUK * 

* * 

* INSTYTUT FIZYKI TEORETYCZNE J , WARSZAWA * 

* LAWRENCE LIVERMORE NATIONAL LABORATORY, USA * 

* * 

* 1993-2011 * 

* * 

* CODE COMPILED WITH THE FOLLOWING ARRAY DIMENSIONS AND SWITCHES: * 

* * 
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* * 

* NDBASE = 680 NDSTAT = 680 NDXHRM = 40 NDYHRM = 40 NDZHRM = 40 * 

* * 

* NDMAIN = 16 NDMULT = 9 NDMULR = 4 NDLAMB = 9 NDITER = 5000 * 

* * 

* NDAKNO = 1 NDBKNO = 1 NDPROI = 20 NDCOUL = 80 NDPOLS = 25 * 

* * 

* NDPROT = 10 NDBTKN =10 * 

* * 

* IPARAL = I.CRAY =0 * 

* * 

* * 

* PRE-PROCESSOR OPTIONS: * 

* * 

* switch_port = 1 switch_diag = 3 switch_cray =0 * 

* * 

* switch_nagl = switch_quad = switch_vect =1 * 

* * 

* * 

* BROYDEN METHOD IS: ON * 

* * 

* TRIGGERED ONLY WHEN STABILITY IS LOWER THAN : 10000.000 MEV * 

* INITIAL SLOWING FACTOR (BEFORE TRIGGER) : 0.50 (=SLOWEV) * 

* BROYDEN SLOWING FACTOR (AFTER TRIGGER) : 0.70 (=1-ALPHAM) * 

* NUMBER OF ITERATIONS RETAINED IN MEMORY : 7 * 

* * 

* * 

* ONLY THE ISOSPIN PROJECTION IS PERFORMED * 

* 8 GAUSS-LEGENDRE KNOTS IS USED * 

* * 
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* CUT-OFF "EPSISO" = 0.00000000010000 ==> GOOD-T BASIS OF DIM = 4 * 

* * 

* * 

* ISOSPIN-MIXED EIGENSTATES * 

* ^ 

* * 

* N EIGENENERGY i T lC_i|"2 Re [C_i] Im[C_i] * 

* * 

* 1 -342.860677 1 0.994703 0.997348 0.000000 * 

* 2 1 0.005296 0.072772 0.000000 * 

* 3 2 0.000002 0.001316 0.000000 * 

* 4 3 0.000000 -0.000061 0.000000 * 

* DOMINANT AMPLITUDE SQUARED EQUALS: 0.9947025220 AND CORRESPONDS TO T = * 

* COULOMB MIXING IN THIS STATE IS: 0.0052974780 [ 0.529748%] * 

* * 

* * 

* 2 -300.253660 1 0.005276 0.072636 0.000000 * 

* 2 1 0.988062 -0.994013 0.000000 * 

* 3 2 0.006659 -0.081605 0.000000 * 

* 4 3 0.000003 -0.001655 0.000000 * 

* DOMINANT AMPLITUDE SQUARED EQUALS: 0.9880618341 AND CORRESPONDS TO T = 1 * 

* COULOMB MIXING IN THIS STATE IS: 0.0119381659 [ 1.193817°/.] * 

* * 

* * 

* 3 -258.016938 1 0.000021 -0.004628 0.000000 * 

* 2 1 0.006609 0.081296 0.000000 * 

* 3 2 0.985082 -0.992513 0.000000 * 

* 4 3 0.008288 -0.091039 0.000000 * 

* DOMINANT AMPLITUDE SQUARED EQUALS: 0.9850815887 AND CORRESPONDS TO T = 2 * 

* COULOMB MIXING IN THIS STATE IS: 0.0149184113 [ 1.491841%] * 

* * 

* * 

* 4 -215.384252 1 0.000000 0.000241 0.000000 * 

* 2 1 0.000033 -0.005784 0.000000 * 

* 3 2 0.008257 0.090869 0.000000 * 

* 4 3 0.991709 -0.995846 0.000000 * 

* DOMINANT AMPLITUDE SQUARED EQUALS: 0.9917092280 AND CORRESPONDS TO T = 3 * 

* COULOMB MIXING IN THIS STATE IS: 0.0082907720 [ 0.829077%] * 

* * 

* * 
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